{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "prerequisite-weather",
   "metadata": {},
   "source": [
    "# Partial Dependence Plot\n",
    "\n",
    "## Summary\n",
    "\n",
    "Partial dependence plots visualize the dependence between the response and a set of target features (usually one or two), marginalizing over all the other features. For a perturbation-based interpretability method, it is relatively quick. PDP assumes independence between the features, and can be misleading interpretability-wise when this is not met (e.g. when the model has many high order interactions).\n",
    "\n",
    "## How it Works\n",
    "\n",
    "The PDP module for `scikit-learn` {cite}`pedregosa2011scikit` provides a succinct description of the algorithm [here](https://scikit-learn.org/stable/modules/partial_dependence.html).\n",
    "\n",
    "Christoph Molnar's \"Interpretable Machine Learning\" e-book {cite}`molnar2020interpretable` has an excellent overview on partial dependence that can be found [here](https://christophm.github.io/interpretable-ml-book/pdp.html).\n",
    "\n",
    "The conceiving paper \"Greedy Function Approximation: A Gradient Boosting Machine\" {cite}`friedman2001greedy` provides a good motivation and definition."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "human-rally",
   "metadata": {},
   "source": [
    "## Code Example\n",
    "\n",
    "The following code will train a blackbox pipeline for the breast cancer dataset. Aftewards it will interpret the pipeline and its decisions with Partial Dependence Plots. The visualizations provided will be for global explanations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "quantitative-sarah",
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret import set_visualize_provider\n",
    "from interpret.provider import InlineProvider\n",
    "set_visualize_provider(InlineProvider())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "atomic-words",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.decomposition import PCA\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "from interpret import show\n",
    "from interpret.blackbox import PartialDependence\n",
    "\n",
    "seed = 42\n",
    "np.random.seed(seed)\n",
    "X, y = load_breast_cancer(return_X_y=True, as_frame=True)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)\n",
    "\n",
    "pca = PCA()\n",
    "rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=seed)\n",
    "\n",
    "blackbox_model = Pipeline([('pca', pca), ('rf', rf)])\n",
    "blackbox_model.fit(X_train, y_train)\n",
    "\n",
    "pdp = PartialDependence(blackbox_model.predict_proba, data=X_train)\n",
    "pdp_global = pdp.explain_global()\n",
    "\n",
    "show(pdp_global, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "decent-image",
   "metadata": {},
   "source": [
    "## Further Resources\n",
    "\n",
    "- [Paper link to conceiving paper](https://projecteuclid.org/download/pdf_1/euclid.aos/1013203451)\n",
    "- [scikit-learn on their PDP module](https://scikit-learn.org/stable/modules/partial_dependence.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "continuous-hundred",
   "metadata": {},
   "source": [
    "## Bibliography\n",
    "\n",
    "```{bibliography} references.bib\n",
    ":style: unsrt\n",
    ":filter: docname in docnames\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dependent-firewall",
   "metadata": {},
   "source": [
    "## API\n",
    "\n",
    "### PartialDependence\n",
    "\n",
    "```{eval-rst}\n",
    ".. autoclass:: interpret.blackbox.PartialDependence\n",
    "   :members:\n",
    "   :inherited-members:\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "modular-exploration",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
